fix: keep large LOAD external scan multi-CN #24855
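One more substantive issue: currently … The result is that, although the actual input rows contain extra fields, the new stats are computed only from the target column width. For this kind of LOAD, the execution-path threshold is crossed earlier than it should be, which can push jobs that should not go multi-CN onto the multi-CN path prematurely. In other words, this change takes the goal of "estimating from the real input row width" and, for LOADs with …

After looking at the latest update, the earlier …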
3308b8e to 1641216
1641216 to 6ea83f8
XuPeng-SH left a comment
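The latest update addresses all the points I raised earlier:
- the off-by-one at the BlockNum boundary is now a true ceil;
- the inline CSV row-width estimate now counts the terminator;
- file-based text load now samples the first line, and the leftover cases I was stuck on last round (IGNORE LINES and \r\n) are covered;
- the related unit tests were extended to cover the boundary and sampling branches.
This version looks good to me.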
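The review points above can be illustrated with a minimal Go sketch. It is not the PR's actual code: the function names (`ceilDiv`, `sampleRowWidth`, `sampleRowWidthFromString`) are hypothetical, and it assumes rows are terminated by `\n` or `\r\n` and that the row width is taken from the first line after the skipped header lines.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// ceilDiv returns ceil(a / b) for positive inputs and 0 otherwise.
// Unlike a/b + 1, it does not overshoot when a is an exact multiple of b.
func ceilDiv(a, b int64) int64 {
	if a <= 0 || b <= 0 {
		return 0
	}
	return (a + b - 1) / b
}

// sampleRowWidth (hypothetical helper) skips ignoreLines leading lines and
// returns the byte width of the first data line, including its terminator
// ("\n" or "\r\n"). It returns 0 when no data line is available.
func sampleRowWidth(r io.Reader, ignoreLines int) (int64, error) {
	br := bufio.NewReader(r)
	for i := 0; i <= ignoreLines; i++ {
		// ReadString keeps the delimiter, so a preceding '\r' is counted too.
		line, err := br.ReadString('\n')
		if err != nil && err != io.EOF {
			return 0, err
		}
		if i == ignoreLines {
			return int64(len(line)), nil
		}
		if err == io.EOF {
			break // input ended before reaching the first data line
		}
	}
	return 0, nil
}

// sampleRowWidthFromString is a convenience wrapper for in-memory samples.
func sampleRowWidthFromString(s string, ignoreLines int) (int64, error) {
	return sampleRowWidth(strings.NewReader(s), ignoreLines)
}
```

With one header line skipped, a first data line of `1,alice\r\n` samples as 9 bytes (7 characters plus the two-byte terminator). An estimated row count could then be derived as `ceilDiv(inputBytes, rowWidth)`, and a block count as `ceilDiv(rows, rowsPerBlock)`, where `rowsPerBlock` is whatever block capacity the planner assumes.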
What type of PR is this?
Which issue(s) this PR fixes:
issue #24846
What this PR does / why we need it:
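This fixes LOAD external scan stats so large LOAD jobs keep row/cardinality semantics for Cost, Outcnt, TableCnt, and BlockNum, while preserving Cost * Rowsize as the input-size hint used by external scan parallel sizing.

Previously the LOAD stats used input bytes as Cost with Rowsize=1 and forced BlockNum=1/TableCnt=1. That can make large CSV/TBL LOAD choose AP_ONECN instead of the expected multi-CN AP path.

Tests: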
CostwithRowsize=1and forcedBlockNum=1/TableCnt=1. That can make large CSV/TBL LOAD chooseAP_ONECNinstead of the expected multi-CN AP path.Tests: